Vector-quantization based mask estimation for missing data automatic speech recognition
Abstract
The application of Missing Data Theory (MDT) has been shown to improve the robustness of automatic speech recognition (ASR) systems. A crucial part of an MDT-based recognizer is the computation of the reliability masks from noisy data. To estimate accurate masks in environments with unknown, non-stationary noise statistics, only weak assumptions can be made about the noise, and we need to rely on a strong model for the speech. In this paper, we present a missing data detector that uses harmonicity in the noisy input signal and a vector quantizer (VQ) to confine speech models to a subspace. The resulting system can deal with additive and convolutional noise and shows promising results on the Aurora4 large vocabulary database.
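As a rough illustration of the idea of constraining the speech estimate with a VQ codebook, the sketch below picks the codeword that best explains a noisy log-spectral frame and marks a bin as reliable when the observation stays close to that codeword. This is not the detector described in the abstract: the function name `estimate_mask`, the asymmetric cost, the `threshold_db` and `under_weight` parameters, and the toy codebook are all assumptions made for illustration, and the harmonicity cue and convolutional noise handling are left out.

```python
import numpy as np

def estimate_mask(noisy_db, codebook_db, threshold_db=3.0, under_weight=0.1):
    """Hypothetical VQ-based reliability mask for one frame.

    noisy_db:    (F,) noisy log-spectrum in dB.
    codebook_db: (K, F) clean-speech codewords in dB (assumed pre-trained).
    Returns (mask, best), where mask[f] is True for bins judged reliable.
    """
    noisy_db = np.asarray(noisy_db, dtype=float)
    codebook_db = np.asarray(codebook_db, dtype=float)

    # Difference between each codeword and the observation, shape (K, F).
    diff = codebook_db - noisy_db

    # Assumption: additive noise mostly raises energy, so a codeword that
    # exceeds the observation is penalised fully, while a codeword that lies
    # below it (noise may fill the gap) is penalised only lightly.
    over = np.clip(diff, 0.0, None)
    under = np.clip(-diff, 0.0, None)
    cost = (over ** 2).sum(axis=1) + under_weight * (under ** 2).sum(axis=1)

    best = int(np.argmin(cost))

    # A bin is reliable when the observation is within threshold_db of the
    # selected clean-speech codeword, i.e. speech dominates that bin.
    mask = (noisy_db - codebook_db[best]) <= threshold_db
    return mask, best

# Toy usage: two codewords, and a frame equal to codeword 0 with extra
# energy in the two highest bins.
codebook = np.array([[60.0, 50.0, 40.0, 30.0],
                     [30.0, 40.0, 50.0, 60.0]])
frame = np.array([60.0, 50.0, 48.0, 45.0])
mask, best = estimate_mask(frame, codebook)
```

In this toy case the first codeword is selected and only the two lowest bins are marked reliable. A real system would train the codebook on clean speech and tune the cost and threshold on development data.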
Similar resources
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, the use of clustering algorithms for data fusion at the decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
On the Relation between Statistical Properties of Spectrographic Masks and Recognition Accuracy
Missing Data Techniques (MDT) can significantly improve the accuracy of automatic speech recognition (ASR) for speech corrupted by background noise. The increase in recognition accuracy obtained using MDT is largely dependent on the estimation of spectrographic masks used to distinguish speech from noise. We present an analysis technique which enables us to compare two mask estimation technique...
Mask estimation in non-stationary noise environments for missing feature based robust speech recognition
In missing feature based automatic speech recognition (ASR), the role of the spectro-temporal mask in providing an accurate description of the relationship between target speech and environmental noise is critical for minimizing the degradation in ASR word accuracy (WAC) as the signal-to-noise ratio (SNR) decreases. This paper demonstrates the importance of accurate characterization of instanta...
Noise Robust Missing Data Mask Estimation Based on Automatically Learned Features
In this work, we present a missing feature reconstruction based automatic speech recognition (ASR) system in which masks are estimated by binary classification of features generated by Gaussian-Bernoulli restricted Boltzmann machines (GRBMs). The system is evaluated on Track 1 of the 2nd CHiME challenge data. Overall, the best performance is achieved when the reconstructed speech featur...
Mask estimation and sparse imputation for missing data speech recognition in multisource reverberant environments
This work presents an automatic speech recognition system which uses a missing data approach to compensate for environmental noise. The missing, noise-corrupted components are identified using binaural features or a support vector machine (SVM) classifier. To perform speech recognition using the partially observed data, the missing components are substituted with clean speech estimates calculat...
Publication date: 2007